Network Multicomputing Using Recoverable Distributed Shared Memory
Abstract
A network multicomputer is a multiprocessor in which the processors are connected by general-purpose networking technology, in contrast to current distributed-memory multiprocessors, where a dedicated special-purpose interconnect is used. The advent of high-speed general-purpose networks provides the impetus for a new look at the network multiprocessor model, by removing the bottleneck of current slow networks. However, major software issues remain unsolved. A convenient machine abstraction must be developed that hides from the application programmer low-level details such as message passing or machine failures. We use distributed shared memory as a programming abstraction, and rollback recovery through consistent checkpointing to provide fault tolerance. Measurements of our implementations of distributed shared memory and consistent checkpointing show that these abstractions can be implemented efficiently.
Similar resources
UsulDSM: A Page-based Recoverable Distributed Shared Memory Project Report
UsulDSM is a page-based recoverable software distributed shared memory system designed for networks of computers that do not have access to physically shared memory. In this report we describe the architecture of UsulDSM and discuss its design and implementation. We also evaluate its performance through a simple parallel application that uses UsulDSM. We also analyze UsulDSM’s scalability and t...
Architectural Issues in Adopting Distributed Shared Memory for Distributed Object Management Systems
Distributed shared memory (DSM) provides transparent network interface based on the memory abstraction. Furthermore, DSM gives us the ease of programming and portability. Also the advantages offered by DSM include low network overhead, with no explicit operating system intervention to move data over network. With the advent of high-bandwidth networks and wide addressing, adopting DSM for distrib...
An Extended Coherence Protocol for Recoverable DSM Systems with Causal Consistency
This paper presents a coherence protocol for recoverable Distributed Shared Memory (DSM) systems with causally consistent read-write objects. It uses independent checkpointing tightly integrated with coherence operations. That integration results in high availability of shared objects and ensures fast restoration of the consistent state of DSM in spite of multiple node failures, introducing lit...
Recoverable Distributed Shared Memory Using the Competitive Update Protocol
In this paper, we propose a recoverable DSM that uses a competitive update protocol. In this update protocol, multiple copies of each page may be maintained at different nodes. However, it is also possible for a page to exist in only one node, as some copies of the page may be invalidated. We propose an implementation that makes the competitive update protocol recoverable from a single node failu...
Replication for Efficiency and Fault Tolerance in a DSM System
Distributed Shared Memory (DSM) systems implemented on a network of workstations (NOW) have become a convenient alternative to shared memory architectures to execute long running parallel applications. However, such architectures are susceptible to experience failures. This paper presents the design and implementation of a recoverable DSM (RDSM) based on a backward error recovery (BER) mechani...